Automatically Derived Discourse Segmentation Algorithms Based on Acoustic-Prosodic Features
Abstract
We describe an investigation aimed at furthering the understanding of how speakers communicate discourse structural information using intonational features. We used the read and spontaneous speech of two speakers from the Boston Directions Corpus (BDC) to automatically identify elements of discourse structure based on intonational features. Unlike previous acoustic-prosodic analyses of discourse corpora, we used discourse segmentations produced by both naïve and expert labelers and analyzed more than a single speaker. We extended previous work by using machine learning algorithms to automate the construction of models that identify discourse segment initial (SBEG), final (SF), and medial (SCONT) phrases. Although the models used a variety of features to classify the three phrase types, general trends emerged. The length of the preceding pause and the change in maximum fundamental frequency (f0) from the previous phrase were the most informative features for classifying SBEG phrases. For SF phrases, important features included the duration of the subsequent pause and average f0. Finally, SCONT phrases were more difficult to classify automatically; their classification often involved constraints on the durations of the preceding and subsequent pauses and on the change in maximum f0 from the previous phrase. The learned models provide further insight into how speakers use acoustic-prosodic properties to convey information about discourse structure. A firmer understanding of the relationship between acoustic-prosodic cues and discourse structure would lead to improvements in systems for interpreting and generating dialogue and in applications such as text-to-speech synthesis.
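The abstract names the most informative features (preceding pause, subsequent pause, change in maximum f0 from the previous phrase, average f0) but not how they were computed or which learners were used. The following is a minimal sketch, assuming each intonational phrase has already been time-aligned and summarized by its maximum and mean f0; the `Phrase` type, its field names, the `phrase_features` helper, and the example values are all hypothetical. It stops at the per-phrase feature vectors a learner would consume and does not reproduce the study's models.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Phrase:
    """Hypothetical summary of one intonational phrase (times in s, f0 in Hz)."""
    start: float
    end: float
    f0_max: float
    f0_mean: float


def phrase_features(phrases: List[Phrase]) -> List[Dict[str, Optional[float]]]:
    """Compute per-phrase acoustic-prosodic features of the kind named above.

    A feature that depends on a missing neighbour (e.g. the pause before the
    first phrase) is returned as None rather than guessed.
    """
    feats = []
    for i, p in enumerate(phrases):
        prev = phrases[i - 1] if i > 0 else None
        nxt = phrases[i + 1] if i + 1 < len(phrases) else None
        feats.append({
            # Silence between the end of the previous phrase and this one.
            "pause_before": (p.start - prev.end) if prev is not None else None,
            # Silence between the end of this phrase and the next one.
            "pause_after": (nxt.start - p.end) if nxt is not None else None,
            # Change in maximum f0 relative to the previous phrase.
            "delta_f0_max": (p.f0_max - prev.f0_max) if prev is not None else None,
            # Average f0 over this phrase.
            "f0_mean": p.f0_mean,
        })
    return feats


# Made-up example: three consecutive phrases from one speaker.
phrases = [
    Phrase(start=0.00, end=1.20, f0_max=220.0, f0_mean=180.0),
    Phrase(start=1.90, end=3.00, f0_max=250.0, f0_mean=190.0),
    Phrase(start=3.10, end=4.00, f0_max=200.0, f0_mean=170.0),
]
feats = phrase_features(phrases)
for f in feats:
    print(f)
```

In a setup like the one described, each feature dictionary would be paired with an SBEG, SF, or SCONT label taken from the labelers' segmentations before training; returning None for undefined neighbours keeps segment-boundary phrases from being given a fabricated pause value.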
Similar papers
Developing Algorithms for Discourse Segmentation
The structuring of discourse into multi-utterance segments has been claimed to correlate with linguistic phenomena such as reference, prosody, and the distribution of pauses and cue words. We discuss two methods for developing segmentation algorithms that take advantage of such correlations, by analyzing a coded corpus of spoken narratives. The coding includes a linear segmentation derived from...
The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients
Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of patients with PD who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...
Prosodic boundary information helps unsupervised word segmentation
It is well known that prosodic information is used by infants in early language acquisition. In particular, prosodic boundaries have been shown to help infants with sentence- and word-level segmentation. In this study, we extend an unsupervised method for word segmentation to include information about prosodic boundaries. The boundary information used was either derived from oracle data (handanno...
Assessing Prosodic And Text Features For Segmentation Of Mandarin Broadcast News
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic segmentation of other languages...
Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic, acoustic feature-based segment...